Description

The present invention relates generally to the manipulation of genetic materials and, more particularly, 
to the manufacture of specific DNA sequences useful in recombinant procedures to secure the production 
of proteins of interest.

Genetic materials may be broadly defined as those chemical substances which program for and guide 
the manufacture of constituents of cells and viruses and direct the responses of cells and viruses. A long 
chain polymeric substance known as deoxyribonucleic acid (DNA) comprises the genetic material of all 
living cells and viruses except for certain viruses which are programmed by ribonucleic acids (RNA). The
repeating units in DNA polymers are four different nucleotides, each of which consists of either a purine
(adenine or guanine) or a pyrimidine (thymine or cytosine) bound to a deoxyribose sugar to which a
phosphate group is attached. Attachment of nucleotides in linear polymeric form is by means of fusion of
the 5' phosphate of one nucleotide to the 3' hydroxyl group of another. Functional DNA occurs in the form
of stable double stranded associations of single strands of nucleotides (known as deoxyoligonucleotides),
which associations occur by means of hydrogen bonding between purine and pyrimidine bases [i.e.,
"complementary" associations existing either between adenine (A) and thymine (T) or guanine (G) and
cytosine (C)]. By convention, nucleotides are referred to by the names of their constituent purine or
pyrimidine bases, and the complementary associations of nucleotides in double stranded DNA (i.e., A-T and
G-C) are referred to as "base pairs". Ribonucleic acid is a polynucleotide comprising adenine, guanine,
cytosine and uracil (U), rather than thymine, bound to ribose and a phosphate group.

Most briefly put, the programming function of DNA is generally effected through a process wherein
specific DNA nucleotide sequences (genes) are "transcribed" into relatively unstable messenger RNA
(mRNA) polymers. The mRNA, in turn, serves as a template for the formation of structural, regulatory and
catalytic proteins from amino acids. This translation process involves the operations of small RNA strands
(tRNA) which transport and align individual amino acids along the mRNA strand to allow for formation of
polypeptides in proper amino acid sequences. The mRNA "message", derived from DNA and providing the
basis for the tRNA supply and orientation of any given one of the twenty amino acids for polypeptide
"expression", is in the form of triplet "codons" - sequential groupings of three nucleotide bases. In one
sense, the formation of a protein is the ultimate form of "expression" of the programmed genetic message
provided by the nucleotide sequence of a gene.

Certain DNA sequences which usually "precede" a gene in a DNA polymer provide a site for initiation 
of the transcription into mRNA. These are referred to as "promoter" sequences. Other DNA sequences, 
also usually "upstream" of (i.e., preceding) a gene in a given DNA polymer, bind proteins that determine 
the frequency (or rate) of transcription initiation. These other sequences are referred to as "regulator"
sequences. Thus, sequences which precede a selected gene (or series of genes) in a functional DNA
polymer and which operate to determine whether the transcription (and eventual expression) of a gene will 
take place are collectively referred to as "promoter/regulator" or "control" DNA sequences. DNA sequences 
which "follow" a gene in a DNA polymer and provide a signal for termination of the transcription into mRNA 
are referred to as "terminator" sequences. 

A focus of microbiological processing for nearly the last decade has been the attempt to manufacture
industrially and pharmaceutically significant substances using organisms which do not initially have
genetically coded information concerning the desired product included in their DNA. Simply put, a gene that
specifies the structure of a product is either isolated from a "donor" organism or chemically synthesized
and then stably introduced into another organism which is preferably a self-replicating unicellular
microorganism. Once this is done, the existing machinery for gene expression in the "transformed" host cells
operates to construct the desired product.

The art is rich in patent and literature publications relating to "recombinant DNA" methodologies for the
isolation, synthesis, purification and amplification of genetic materials for use in the transformation of
selected host organisms. U.S. Letters Patent No. 4,237,224 to Cohen, et al., for example, relates to
transformation of procaryotic unicellular host organisms with "hybrid" viral or circular plasmid DNA which
includes selected exogenous DNA sequences. The procedures of the Cohen, et al. patent first involve
manufacture of a transformation vector by enzymatically cleaving viral or circular plasmid DNA to form
linear DNA strands. Selected foreign DNA strands are also prepared in linear form through use of similar
enzymes. The linear viral or plasmid DNA is incubated with the foreign DNA in the presence of ligating
enzymes capable of effecting a restoration process and "hybrid" vectors are formed which include the
selected foreign DNA segment "spliced" into the viral or circular DNA plasmid.

Transformation of compatible unicellular host organisms with the hybrid vector results in the formation 
of multiple copies of the foreign DNA in the host cell population. In some instances, the desired result is
simply the amplification of the foreign DNA and the "product" harvested is DNA. More frequently, the goal
of transformation is the expression by the host cells of the foreign DNA in the form of large scale synthesis
of isolatable quantities of commercially significant protein or polypeptide fragments coded for by the foreign
DNA. See also, e.g., U.S. Letters Patent Nos. 4,269,731 (to Shine), 4,273,875 (to Manis) and 4,293,652 (to
Cohen).

The success of procedures such as described in the Cohen, et al. patent is due in large part to the
ready availability of "restriction endonuclease" enzymes which facilitate the site-specific cleavage of both
the unhybridized DNA vector and, e.g., eukaryotic DNA strands containing the foreign sequences of interest.
Cleavage in a manner providing for the formation of single stranded complementary "ends" on the double
stranded linear DNA strands greatly enhances the likelihood of functional incorporation of the foreign DNA
into the vector upon "ligating" enzyme treatment. A large number of such restriction endonuclease
enzymes are currently commercially available. [See, e.g., "BRL Restriction Endonuclease Reference Chart"
appearing in the "'82 Catalog" of Bethesda Research Laboratories, Inc., Gaithersburg, Maryland.]
Verification of hybrid formation is facilitated by chromatographic techniques which can, for example,
distinguish the hybrid plasmids from non-hybrids on the basis of molecular weight. Other useful verification
techniques involve radioactive DNA hybridization.

Another manipulative "tool" largely responsible for successes in transformation of procaryotic cells is 
the use of selectable "marker" gene sequences. Briefly put, hybrid vectors are employed which contain, in 
addition to the desired foreign DNA, one or more DNA sequences which code for expression of a
phenotypic trait capable of distinguishing transformed from non-transformed host cells. Typical marker gene
sequences are those which allow a transformed procaryotic cell to survive and propagate in a culture 
medium containing metals, antibiotics, and like components which would kill or severely inhibit propagation 
of non-transformed host cells. 

Successful expression of an exogenous gene in a transformed host microorganism depends to a great
extent on incorporation of the gene into a transformation vector with a suitable promoter/regulator region
present to insure transcription of the gene into mRNA and other signals which insure translation of the
mRNA message into protein (e.g., ribosome binding sites). It is not often the case that the "original"
promoter/regulator region of a gene will allow for high levels of expression in the new host. Consequently,
the gene to be inserted must either be fitted with a new, host-accommodated transcription and translation
regulating DNA sequence prior to insertion or it must be inserted at a site where it will come under the
control of existing transcription and translation signals in the vector DNA.

It is frequently the case that the insertion of an exogenous gene into, e.g., a circular DNA plasmid
vector, is performed at a site either immediately following an extant transcription and translation signal or
within an existing plasmid-borne gene coding for a rather large protein which is the subject of high degrees
of expression in the host. In the latter case, the host's expression of the "fusion gene" so formed results in
high levels of production of a "fusion protein" including the desired protein sequence (e.g., as an
intermediate segment which can be isolated by chemical cleavage of the large protein). Such procedures not
only insure desired regulation and high levels of expression of the exogenous gene product but also result
in a degree of protection of the desired protein product from attack by proteases endogenous to the host.
Further, depending on the host organism, such procedures may allow for a kind of "piggyback"
transportation of the desired protein from the host cells into the cell culture medium, eliminating the need to destroy
host cells for the purpose of isolating the desired product.

While the foregoing generalized descriptions of published recombinant DNA methodologies may make 
the processes appear to be rather straightforward, easily performed and readily verified, it is actually the
case that the DNA sequence manipulations involved are quite painstakingly difficult to perform and almost
invariably characterized by very low yields of desired products.

As an example, the initial "preparation" of a gene for insertion into a vector to be used in transformation
of a host microorganism can be an enormously difficult process, especially where the gene to be expressed
is endogenous to a higher organism such as man. One laborious procedure practiced in the art is the
systematic cloning into recombinant plasmids of the total DNA genome of the "donor" cells, generating
immense "libraries" of transformed cells carrying random DNA sequence fragments which must be
individually tested for expression of a product of interest. According to another procedure, total mRNA is
isolated from high expression donor cells (presumptively containing multiple copies of mRNA coded for the
product of interest), first "copied" into single stranded cDNA with reverse transcriptase enzymes, then into
double stranded form with polymerase, and cloned. The procedure again generates a library of transformed
cells, somewhat smaller than a total genome library, which may include the desired gene copies free of
non-transcribed "introns" which can significantly interfere with expression by a host microorganism. The
above-noted time-consuming gene isolation procedures were in fact employed in published recombinant DNA
procedures for obtaining microorganism expression of several proteins, including rat proinsulin [Ullrich, et
al., Science, 196, pp. 1313-1318 (1977)], human fibroblast interferon [Goedell, et al., Nucleic Acids
Research, 8, pp. 4087-4094 (1980)], mouse β-endorphin [Shine, et al., Nature, 285, pp. 456-461 (1980)] and
human leukocyte interferon [Goedell, et al., Nature, 287, pp. 411-416 (1980); and Goedell, et al., Nature,
290, pp. 20-26 (1981)].

Whenever possible, the partial or total manufacture of genes of interest from nucleotide bases
constitutes a much preferred procedure for preparation of genes to be used in recombinant DNA methods.
A requirement for such manufacture is, of course, knowledge of the correct amino acid sequence of the
desired polypeptide. With this information in hand, a generative DNA sequence code for the protein (i.e., a
properly ordered series of base triplet codons) can be planned and a corresponding synthetic, double
stranded DNA segment can be constructed. A combination of manufacturing and cDNA synthetic
methodologies is reported to have been employed in the generation of a gene for human growth hormone.
Specifically, a manufactured linear double stranded DNA sequence of 72 nucleotide base pairs (comprising
codons specifying the first 24 amino acids of the desired 191 amino acid polypeptide) was ligated to a
cDNA-derived double strand coding for amino acids Nos. 25-191 and inserted in a modified pBR322
plasmid at a locus controlled by a lac promoter/regulator sequence [Goedell, et al., Nature, 281, pp.
544-548 (1981)].

Completely synthetic procedures have been employed for the manufacture of genes coding for 
relatively "short" biologically functional polypeptides, such as human somatostatin (14 amino acids) and
human insulin (2 polypeptide chains of 21 and 30 amino acids, respectively).

In the somatostatin gene preparative procedure [Itakura, et al., Science, 198, pp. 1056-1063 (1977)] a
52 base pair gene was constructed wherein 42 base pairs represented the codons specifying the required
14 amino acids and an additional 10 base pairs were added to permit formation of "sticky-end" single
stranded terminal regions employed for ligating the structural gene into a microorganism transformation
vector. Specifically, the gene was inserted close to the end of a β-galactosidase enzyme gene and the
resultant fusion gene was expressed as a fusion protein from which somatostatin was isolated by cyanogen
bromide cleavage. Manufacture of the human insulin gene, as noted above, involved preparation of genes
coding for a 21 amino acid chain and for a 30 amino acid chain. Eighteen deoxyoligonucleotide fragments
were combined to make the gene for the longer chain, and eleven fragments were joined into a gene for the
shorter chain. Each gene was employed to form a fusion gene with a β-galactosidase gene and the
individually expressed polypeptide chains were enzymatically isolated and linked to form complete insulin
molecules. [Goedell, et al., Proc. Nat. Acad. Sci. U.S.A., 76, pp. 106-110 (1979).]

In each of the above procedures, deoxyoligonucleotide segments were prepared, and then sequentially
ligated according to the following general procedure. [See, e.g., Agarwal, et al., Nature, 227, pp. 1-7 (1970)
and Khorana, Science, 203, pp. 614-675 (1979)]. An initial "top" (i.e., 5'-3' polarity) deoxyoligonucleotide
segment is enzymatically joined to a second "top" segment. Alignment of these two "top" strands is made
possible using a "bottom" (i.e., 3' to 5' polarity) strand having a base sequence complementary to half of
the first top strand and half of the second top strand. After joining, the uncomplemented bases of the top
strands "protrude" from the duplex portion formed. A second bottom strand is added which includes the
five or six base complement of a protruding top strand, plus an additional five or six bases which then
protrude as a bottom single stranded portion. The two bottom strands are then joined. Such sequential
additions are continued until a complete gene sequence is developed, with the total procedure being very
time-consuming and highly inefficient.
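
The half-overlap layout of top and bottom strands described above can be pictured with a minimal Python sketch. It is an illustration only: the function names, the equal segment length and the toy sequence are assumptions introduced here and are not taken from the cited procedures.

```python
# Illustrative sketch only: function names, the fixed segment length and the
# toy sequence used with it are assumptions, not part of the cited work.
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Return the complementary strand, written in 5'->3' polarity."""
    return seq.translate(COMPLEMENT)[::-1]


def overlap_segments(top: str, seg_len: int):
    """Split a top strand into segments and derive bridging bottom segments.

    Each bottom segment is complementary to the last half of one top
    segment and the first half of the next, so that it can align the two
    top segments for enzymatic joining.
    """
    if seg_len % 2 or len(top) % seg_len:
        raise ValueError("seg_len must be even and divide the strand length")
    half = seg_len // 2
    tops = [top[i:i + seg_len] for i in range(0, len(top), seg_len)]
    bottoms = [
        reverse_complement(top[i:i + seg_len])
        for i in range(half, len(top) - seg_len + 1, seg_len)
    ]
    return tops, bottoms
```

For a 24-base toy strand and 12-base segments, this yields two top segments and one bottom segment spanning bases 7 through 18 of the top strand, leaving six uncomplemented bases protruding at each end.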

The time-consuming characteristics of such methods for total gene synthesis are exemplified by reports
that three months' work by at least four investigators was needed to perform the assembly of the two
"short" insulin genes previously referred to. Further, while only relatively small quantities of any
manufactured gene are needed for success of vector insertion, the above synthetic procedures have such poor
overall yields (on the order of 20% per ligation) that the eventual isolation of even minute quantities of a
selected short gene is by no means guaranteed with even the most scrupulous adherence to prescribed
methods. The maximum length gene which can be synthesized is clearly limited by the efficiency with
which the individual short segments can be joined. If n such ligation reactions are required and the yield of
each such reaction is y, the quantity of correctly synthesized genetic material obtained will be proportional
to y^n. Since this relationship is exponential in nature, even a small increase in the yield per ligation reaction
will result in a substantial increase in the length of the largest gene that may be synthesized.
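
The y^n relationship can be made concrete with a short Python sketch. The 20% per-ligation figure comes from the text above; the 40% comparison value and the one-in-a-million recovery threshold are assumptions chosen purely for illustration, as are the function names.

```python
# Illustrative sketch: the recovery threshold and the 40% comparison yield
# are assumed values, not figures reported in the text.
def overall_yield(y: float, n: int) -> float:
    """Fraction of starting material surviving n ligations of yield y each."""
    return y ** n


def max_ligations(y: float, min_fraction: float) -> int:
    """Largest n for which y**n is still at least min_fraction."""
    if not (0 < y < 1 and 0 < min_fraction <= 1):
        raise ValueError("expected 0 < y < 1 and 0 < min_fraction <= 1")
    n = 0
    fraction = 1.0
    while fraction * y >= min_fraction:
        fraction *= y
        n += 1
    return n
```

Under these assumptions, a 20% yield per ligation permits 8 sequential ligations before recovery falls below one part in a million, whereas a 40% yield permits 15, which suggests how strongly the attainable gene length depends on the per-step yield.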

Inefficiencies in the above-noted methodology are due in large part to the formation of undesired
intermediate products. As an example, in an initial reaction forming annealed top strands associated with a
bottom, "template" strand, the desired reaction may be,

[reaction scheme not reproduced]

but the actual products obtained may be

[reaction scheme not reproduced]

or the like. Further, the longer the individual deoxyoligonucleotides are, the more likely it is that they will
form thermodynamically stable self-associations such as "hairpins" or aggregations.

Proposals for increasing synthetic efficiency have not been forthcoming and it was recently reported 
that, "With the methods now available, however, it is not economically practical to synthesize genes for 
peptides longer than about 30 amino acid units, and many clinically important proteins are much longer". 
[Aharonowitz, et al., Scientific American, 245, No. 3, pp. 140-152, at p. 151 (1981).]

An illustration of the "economic practicalities" involved in large gene synthesis is provided by the
recent publication of "successful" efforts in the total synthesis of a human leukocyte interferon gene. [Edge,
et al., Nature, 292, pp. 756-782 (1981).] Briefly summarized, 67 different deoxyoligonucleotides containing
about 15 bases were synthesized and joined in the "50 percent overlap" procedure of the type noted above
to form eleven short duplexes. These, in turn, were assembled into four longer duplexes which were
eventually joined to provide a 514 base pair gene coding for the 166 amino acid protein. The procedure,
which the authors characterize as "rapid", is reliably estimated to have consumed nearly a year's effort by
five workers and the efficiency of the assembly strategy was clearly quite poor. It may be noted, for
example, that while 40 pmole of each of the starting 67 deoxyoligonucleotides was prepared and employed
to form the eleven intermediate-sized duplexes, by the time assembly of the four large duplexes was
achieved, a yield of only about 0.01 pmole of the longer duplexes could be obtained for use in final
assembly of the whole gene.

Another aspect of the practice of recombinant DNA techniques for the expression, by microorganisms,
of proteins of industrial and pharmaceutical interest is the phenomenon of "codon preference". While it was
earlier noted that the existing machinery for gene expression in genetically transformed host cells will
"operate" to construct a given desired product, levels of expression attained in a microorganism can be
subject to wide variation, depending in part on specific alternative forms of the amino acid-specifying
genetic code present in an inserted exogenous gene. A "triplet" codon of four possible nucleotide bases
can exist in 64 variant forms. That these forms provide the message for only 20 different amino acids (as
well as transcription initiation and termination) means that some amino acids can be coded for by more than
one codon. Indeed, some amino acids have as many as six "redundant", alternative codons while some
others have a single, required codon. For reasons not completely understood, alternative codons are not at
all uniformly present in the endogenous DNA of differing types of cells and there appears to exist a variable
natural hierarchy or "preference" for certain codons in certain types of cells.
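
The codon counts mentioned above can be checked with a brief Python sketch. It enumerates the 64 triplets and tallies how many specify each amino acid under the standard genetic code; the one-letter amino acid string and the variable names are conventions assumed here for illustration and do not appear in the text.

```python
# Illustrative sketch: the compact table below encodes the standard genetic
# code in T, C, A, G order; '*' marks the three stop codons.
from collections import Counter
from itertools import product

BASES = "TCAG"
AMINO_ACIDS = (
    "FFLLSSSSYY**CC*W"
    "LLLLPPPPHHQQRRRR"
    "IIIMTTTTNNKKSSRR"
    "VVVVAAAADDEEGGGG"
)

# Map each DNA codon (e.g. "CTG") to its one-letter amino acid.
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}

# Number of alternative codons available for each amino acid.
DEGENERACY = Counter(CODON_TABLE.values())

LEUCINE_CODONS = sorted(c for c, aa in CODON_TABLE.items() if aa == "L")
```

In this tabulation leucine, serine and arginine each have six alternative codons, while methionine and tryptophan each have a single codon.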

As one example, the amino acid leucine is specified by any of six DNA codons including CTA, CTC,
CTG, CTT, TTA, and TTG (which correspond, respectively, to the mRNA codons, CUA, CUC, CUG, CUU,
UUA and UUG). Exhaustive analysis of genome codon frequencies for microorganisms has revealed that
endogenous DNA of E. coli bacteria most commonly contains the CTG leucine-specifying codon, while the
DNA of yeasts and slime molds most commonly includes a TTA leucine-specifying codon. In view of this
hierarchy, it is generally held that the likelihood of obtaining high levels of expression of a leucine-rich
polypeptide by an E. coli host will depend to some extent on the frequency of codon use. For example, a
gene rich in TTA codons will in all probability be poorly expressed in E. coli, whereas a CTG rich gene will
probably be highly expressed. In a like manner, when yeast cells are the projected transformation
host cells for expression of a leucine-rich polypeptide, a preferred codon for use in an inserted DNA would
be TTA. See, e.g., Grantham, et al., Nucleic Acids Research, 8, pp. r49-r62 (1980); Grantham, et al., Nucleic
Acids Research, 8, pp. 1883-1912 (1980); and, Grantham, et al., Nucleic Acids Research, 9, pp. r43-r74
(1981).
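
As a rough illustration of how such a preference might be applied when designing a manufactured gene, the following Python sketch replaces every leucine codon in a coding sequence with the codon stated above to be most common in the projected host. The function name, the host labels and the upper-case, in-frame input format are assumptions; only leucine is handled because it is the only amino acid for which the text gives preferences.

```python
# Illustrative sketch: covers leucine only, using the preferences stated in
# the text (CTG for E. coli, TTA for yeast). Names are hypothetical.
LEUCINE_CODONS = {"CTA", "CTC", "CTG", "CTT", "TTA", "TTG"}
PREFERRED_LEUCINE = {"E. coli": "CTG", "yeast": "TTA"}


def recode_leucine(coding_dna: str, host: str) -> str:
    """Swap each leucine codon for the host's preferred leucine codon.

    coding_dna is assumed to be upper-case and to start in frame.
    """
    if len(coding_dna) % 3:
        raise ValueError("sequence length must be a multiple of three")
    preferred = PREFERRED_LEUCINE[host]
    codons = [coding_dna[i:i + 3] for i in range(0, len(coding_dna), 3)]
    return "".join(preferred if c in LEUCINE_CODONS else c for c in codons)
```

Because all six codons specify leucine, the encoded amino acid sequence is unchanged by this substitution; only the nucleotide sequence differs.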

The implications of codon preference phenomena on recombinant DNA techniques are manifest, and
the phenomenon may serve to explain many prior failures to achieve high expression levels for exogenous
genes in successfully transformed host organisms - a less "preferred" codon may be repeatedly present in
the inserted gene and the host cell machinery for expression may not operate as efficiently. This
phenomenon directs the conclusion that wholly manufactured genes which have been designed to include a
projected host cell's preferred codons provide a preferred form of foreign genetic material for practice of
recombinant DNA techniques. In this context, the absence of procedures for rapid and efficient total gene
manufacture which would permit codon selection is seen to constitute an even more serious roadblock to
advances in the art.

Of substantial interest to the background of the present invention is the state of the art with regard to 
the preparation and use of a class of biologically active substances, the interferons (IFNs). Interferons are
secreted proteins having fairly well-defined antiviral, antitumor and immunomodulatory characteristics. See,
e.g., Gray, et al., Nature, 295, pp. 503-508 (1982) and Edge, et al., supra, and references cited therein.

On the basis of antigenicity and biological and chemical properties, human interferons have been
grouped into three major classes: IFN-α (leukocyte), IFN-β (fibroblast) and IFN-γ (immune). Considerable
information has accumulated on the structures and properties of the virus-induced acid-stable interferons
(IFN-α and β) (leukocyte interferon has been referred to as "LeIFN" and "IFN-α" and is also generally
called "α-IFN"). These have been purified to homogeneity and at least partial amino acid sequences have
been determined. Analyses of cloned cDNA and gene sequences for IFN-β1 and the IFN-α multigene family
have permitted the deduction of the complete amino acid sequences of many of the interferons. In addition,
efficient synthesis of IFN-β1 and several IFN-αs in E. coli, and IFN-α1 in yeast, has now made possible
the purification of large quantities of these proteins in biologically active form.

Much less information is available concerning the structure and properties of IFN-γ, an interferon
generally produced in cultures of lymphocytes exposed to various mitogenic stimuli. It is acid labile and
does not cross-react with antisera prepared against IFN-α or IFN-β. A broad range of biological activities
has been attributed to IFN-γ including potentiation of the antiviral activities of IFN-α and -β, from which it
differs in terms of its virus and cell specificities and the antiviral mechanisms induced.

The above-noted wide variations in biological activities of various interferon types make the
construction of synthetic polypeptide analogs of the interferons of paramount significance to the full development of
the therapeutic potential of this class of compounds. Despite the advantages in isolation of quantities of
interferons which have been provided by recombinant DNA techniques to date, practitioners in this field
have not been able to address the matter of preparation of synthetic polypeptide analogs of the interferons
with any significant degree of success.

Put another way, the work of Gray, et al., supra, in the isolation of a gene coding for IFN-γ and the
extensive labors of Edge, et al., supra, in providing a wholly manufactured IFN-α1 gene provide only genetic
materials for expression of single, very precisely defined, polypeptide sequences. There exist no
procedures (except, possibly, for site specific mutagenesis) which would permit microbial expression of large
quantities of human IFN-γ analogs which differed from the "authentic" polypeptide in terms of the identity
or location of even a single amino acid. In a like manner, preparation of an IFN-α1 analog which differed by
one amino acid from the polypeptide prepared by Edge, et al., supra, would appear to require an additional
year of labor in constructing a whole new gene which varied in terms of a single triplet codon. No means is
readily available for the excision of a fragment of the subject gene and replacement with a fragment
including the coding information for a variant polypeptide sequence. Further, modification of the reported
cDNA-derived and manufactured DNA sequences to vary codon usage is not an available "option".

Indeed, the only report of the preparation of variant interferon polypeptide species by recombinant DNA
techniques has been in the context of preparation and expression of "hybrids" of human genes for IFN-α1
and IFN-α2 [Weck, et al., Nucleic Acids Research, 9, pp. 6153-6168 (1981) and Streuli, et al., Proc. Nat.
Acad. Sci. U.S.A., 78, pp. 2848-2852 (1981)]. The hybrids obtained consisted of the four possible
combinations of gene fragments developed upon finding that two of the eight human (cDNA-derived) genes
fortuitously included only once within the sequence, base sequences corresponding to the restriction
endonuclease cleavage sites for the bacterial endonucleases PvuII and BglII.

There exists, therefore, a substantial need in the art for more efficient procedures for the total synthesis 
from nucleotide bases of manufactured DNA sequences coding for large polypeptides such as the 
interferons. There additionally exists a need for synthetic methods which will allow for the rapid construction 
of variant forms of synthetic sequences such as will permit the microbial expression of synthetic
polypeptides which vary from naturally occurring forms in terms of the identity and/or position of one or 
more selected amino acids. 

BRIEF SUMMARY 

The present invention provides novel, rapid and highly efficient procedures for the total synthesis of 
linear, double stranded DNA sequences in excess of about 200 nucleotide base pairs in length, which 
sequences may comprise entire structural genes capable of directing the synthesis of a wide variety of 
polypeptides of interest. 

According to the invention, linear, double stranded DNA sequences of a length in excess of about 200
base pairs and coding for expression of a predetermined continuous sequence of amino acids within a
selected host microorganism transformed by a selected DNA vector including the sequence, are
synthesized by a method comprising:

(a) preparing two or more different, subunit, linear, double stranded DNA sequences of about 100 or
more base pairs in length for assembly in a selected assembly vector,

each different subunit DNA sequence prepared comprising a series of nucleotide base codons
coding for a different continuous portion of said predetermined sequence of amino acids to be
expressed,

one terminal region of a first of said subunits comprising a portion of a base sequence which
provides a recognition site for cleavage by a first restriction endonuclease, which recognition site is
entirely present either once or not at all in said selected assembly vector upon insertion of the subunit
therein,

one terminal region of a second of said subunits comprising a portion of a base sequence which
provides a recognition site for cleavage by a second restriction endonuclease other than said first
endonuclease, which recognition site is entirely present once or not at all in said selected assembly
vector upon insertion of the subunit therein,

at least one-half of all remaining terminal regions of subunits comprising a portion of a recognition
site (preferably a palindromic six base recognition site) for cleavage by a restriction endonuclease other
than said first and second endonucleases, which recognition site is entirely present once and only once
in said selected assembly vector after insertion of all subunits thereinto; and

(b) serially inserting each of said subunit DNA sequences prepared in step (a) into the selected
assembly vector and effecting the biological amplification of the assembly vector subsequent to each
insertion, thereby to form a DNA vector including the desired DNA sequence coding for the
predetermined continuous amino acid sequence and wherein the desired DNA sequence assembled includes at
least one unique, preferably palindromic six base, recognition site for restriction endonuclease cleavage
at an intermediate position therein.
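
The requirement that a recognition site be palindromic, six bases long, present exactly once and located at an intermediate position can be expressed as a simple sequence check. The Python sketch below is an illustration under stated assumptions: the function names and the treatment of "intermediate" as "not touching either end of the sequence" are choices made here, and the five enzyme sites listed are well-known recognition sequences used only as examples, not a list taken from the method.

```python
# Illustrative sketch: names are hypothetical, and "intermediate position"
# is interpreted here as a site that does not touch either end.
COMPLEMENT = str.maketrans("ACGT", "TGCA")

EXAMPLE_SITES = {
    "EcoRI": "GAATTC",
    "BamHI": "GGATCC",
    "HindIII": "AAGCTT",
    "PvuII": "CAGCTG",
    "BglII": "AGATCT",
}


def reverse_complement(seq: str) -> str:
    """Return the complementary strand, written in 5'->3' polarity."""
    return seq.translate(COMPLEMENT)[::-1]


def is_palindromic(site: str) -> bool:
    """True if the site reads the same on both strands."""
    return site == reverse_complement(site)


def count_sites(seq: str, site: str) -> int:
    """Count occurrences of a recognition site, allowing overlaps."""
    return sum(seq.startswith(site, i) for i in range(len(seq) - len(site) + 1))


def unique_internal_sites(seq: str, sites: dict) -> list:
    """Names of palindromic six-base sites found exactly once, away from the ends."""
    found = []
    for name, site in sites.items():
        if len(site) != 6 or not is_palindromic(site):
            continue
        if count_sites(seq, site) != 1:
            continue
        if 0 < seq.find(site) < len(seq) - len(site):
            found.append(name)
    return found
```

Since a palindromic site is identical on both strands, scanning one strand is sufficient for the count. A check of this kind could be run on a candidate sequence after each planned subunit insertion to confirm that a chosen site remains unique.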

The above general method preferably further includes the step of isolating the desired DNA sequence
from the assembly vector preferably to provide one of the class of novel manufactured DNA sequences
having at least one unique palindromic six base recognition site for restriction endonuclease cleavage at an
intermediate position therein. A sequence so isolated may then be inserted in a different, "expression"
vector and direct expression of the desired polypeptide by a microorganism which is the same as or
different from that in which the assembly vector is amplified. In other preferred embodiments of the method:
at least three different subunit DNA sequences are prepared in step (a) and serially inserted into said
selected assembly vector in step (b) and the desired manufactured DNA sequence obtained includes at
least two unique palindromic six base recognition sites for restriction endonuclease cleavage at intermediate
positions therein; the DNA sequence synthesized comprises an entire structural gene coding for a
biologically active polypeptide; and, in the DNA sequence manufactured, the sequence of nucleotide bases
includes one or more codons selected, from among alternative codons specifying the same amino acid, on
the basis of preferential expression characteristics of the codon in said selected host microorganism.

55 Novel products of the invention include manufactured, linear, double stranded DNA sequences of a 
length in excess of about 200 base pairs and coding for the expression of a predetermined continuous 
sequence of amino acids by a selected host microorganism transformed with a selected DNA vector 
including the sequence, characterized by having at least one unique palindromic six base recognition site 
for restriction endonuclease cleavage at an intermediate position therein. Also included are polypeptide
products of the expression by an organism of such manufactured sequences. 

Illustratively provided by the present invention are novel manufactured genes coding for consensus 
human leukocyte interferons.

DNA subunit sequences for use in practice of the methods of the invention are preferably synthesized
from nucleotide bases according to the methods disclosed in co-owned, concurrently-filed U.S. Patent 
Application Serial No. 375,493, by Yitzhak Stabinsky, entitled "Manufacture and Expression of Structural
Genes" (Attorney's Docket No. 6250). Briefly summarized the general method comprises the steps of: 
(1) preparing two or more different, linear, duplex DNA strands, each duplex strand including a double
stranded region of 12 or more selected complementary base pairs and further including a top single
stranded terminal sequence of from 3 to 7 selected bases at one end of the strand and/or a bottom
single stranded terminal sequence of from 3 to 7 selected bases at the other end of the strand, each
single stranded terminal sequence of each duplex DNA strand comprising the entire base complement of
at most one single stranded terminal sequence of any other duplex DNA strand prepared; and
(2) annealing each duplex DNA strand prepared in step (1) to one or two different duplex strands
prepared in step (1) having a complementary single stranded terminal sequence, thereby to form a
single continuous double stranded DNA sequence which has a duplex region of at least 27 selected base
pairs including at least 3 base pairs formed by complementary association of single stranded terminal
sequences of duplex DNA strands prepared in step (1) and which has from 0 to 2 single stranded top or
bottom terminal regions of from 3 to 7 bases.

In the preferred general process for subunit manufacture, at least three different duplex DNA strands 
are prepared in step (1) and all strands so prepared are annealed concurrently in a single annealing 
reaction mixture to form a single continuous double stranded DNA sequence which has a duplex region of 
at least 42 selected base pairs including at least two non-adjacent sets of 3 or more base pairs formed by
complementary association of single stranded terminal sequences of duplex strands prepared in step (1).

The duplex DNA strand preparation step (1) of the preferred subunit manufacturing process preferably
comprises the steps of:

(a) constructing first and second linear deoxyoligonucleotide segments having 15 or more bases in a
selected linear sequence, the linear sequence of bases of the second segment comprising the total
complement of the sequence of bases of the first segment except that at least one end of the second
segment shall either include an additional linear sequence of from 3 to 7 selected bases beyond those
fully complementing the first segment, or shall lack a linear sequence of from 3 to 7 bases complementary
to a terminal sequence of the first segment, provided, however, that the second segment shall not
have an additional sequence of bases or be lacking a sequence of bases at both of its ends; and,

(b) combining the first and second segments under conditions conducive to complementary association
between segments to form a linear, duplex DNA strand.

The sequence of bases in the double stranded DNA subunit sequences formed preferably includes one 
or more triplet codons selected from among alternative codons specifying the same amino acid on the 
basis of preferential expression characteristics of the codon in a projected host microorganism, such as
yeast cells or bacteria, especially E. coli bacteria.

Also provided by the present invention are improvements in methods and materials for enhancing levels
of expression of selected exogenous genes in E. coli host cells. Briefly stated, expression vectors are 
constructed to include selected DNA sequences upstream of polypeptide coding regions which selected 
sequences are duplicative of ribosome binding site sequences extant in genomic E. coli DNA associated
with highly expressed endogenous polypeptides. A presently preferred selected sequence is duplicative of
the ribosome binding site sequence associated with E. coli expression of outer membrane protein F ("OMP- 
F"). 

Other aspects and advantages of the present invention will be apparent upon consideration of the 
following detailed description thereof. 


DETAILED DESCRIPTION 

As employed herein, the term "manufactured" as applied to a DNA sequence or gene shall designate a 
product either totally chemically synthesized by assembly of nucleotide bases or derived from the biological
replication of a product thus chemically synthesized. As such, the term is exclusive of products "synthesized"
by cDNA methods or genomic cloning methodologies which involve starting materials which are
of biological origin. Table I below sets out abbreviations employed herein to designate amino acids and
includes IUPAC-recommended single letter designations.

TABLE I

Amino Acid        Abbreviation    IUPAC Symbol

Alanine           Ala             A
Cysteine          Cys             C
Aspartic acid     Asp             D
Glutamic acid     Glu             E
Phenylalanine     Phe             F
Glycine           Gly             G
Histidine         His             H
Isoleucine        Ile             I
Lysine            Lys             K
Leucine           Leu             L
Methionine        Met             M
Asparagine        Asn             N
Proline           Pro             P
Glutamine         Gln             Q
Arginine          Arg             R
Serine            Ser             S
Threonine         Thr             T
Valine            Val             V
Tryptophan        Trp             W
Tyrosine          Tyr             Y

The following abbreviations shall be employed for nucleotide bases: A for adenine; G for guanine; T for 
thymine; U for uracil; and C for cytosine. For ease of understanding of the present invention, Tables II and III
below provide tabular correlations between the 64 alternate triplet nucleotide base codons of DNA and the 
20 amino acids and transcription termination ("stop") functions specified thereby. In order to determine the 
corresponding correlations for RNA, U is substituted for T in the tables. 

TABLE II

FIRST                 SECOND POSITION                THIRD
POSITION       T        C        A        G          POSITION

T              Phe      Ser      Tyr      Cys        T
               Phe      Ser      Tyr      Cys        C
               Leu      Ser      Stop     Stop       A
               Leu      Ser      Stop     Trp        G

C              Leu      Pro      His      Arg        T
               Leu      Pro      His      Arg        C
               Leu      Pro      Gln      Arg        A
               Leu      Pro      Gln      Arg        G

A              Ile      Thr      Asn      Ser        T
               Ile      Thr      Asn      Ser        C
               Ile      Thr      Lys      Arg        A
               Met      Thr      Lys      Arg        G

G              Val      Ala      Asp      Gly        T
               Val      Ala      Asp      Gly        C
               Val      Ala      Glu      Gly        A
               Val      Ala      Glu      Gly        G

TABLE III

Amino Acid             Specifying Codon(s)

(A) Alanine            GCT, GCC, GCA, GCG
(C) Cysteine           TGT, TGC
(D) Aspartic acid      GAT, GAC
(E) Glutamic acid      GAA, GAG
(F) Phenylalanine      TTT, TTC
(G) Glycine            GGT, GGC, GGA, GGG
(H) Histidine          CAT, CAC
(I) Isoleucine         ATT, ATC, ATA
(K) Lysine             AAA, AAG
(L) Leucine            TTA, TTG, CTT, CTC, CTA, CTG
(M) Methionine         ATG
(N) Asparagine         AAT, AAC
(P) Proline            CCT, CCC, CCA, CCG
(Q) Glutamine          CAA, CAG
(R) Arginine           CGT, CGC, CGA, CGG, AGA, AGG
(S) Serine             TCT, TCC, TCA, TCG, AGT, AGC
(T) Threonine          ACT, ACC, ACA, ACG
(V) Valine             GTT, GTC, GTA, GTG
(W) Tryptophan         TGG
(Y) Tyrosine           TAC, TAT
STOP                   TAA, TAG, TGA
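The correlations of Tables II and III can be expressed compactly in code. The following is a minimal sketch, not part of the original disclosure: the names `CODON_TABLE`, `translate` and `synonymous_codons` are illustrative assumptions, and the single-letter symbols are those of Table I, with `*` standing for "stop".

```python
from itertools import product

# Base order used for the rows and columns of Table II.
BASES = "TCAG"

# One-letter amino acid symbols (Table I) for the 64 codons, enumerated with
# the first base varying slowest and the third base fastest: TTT, TTC, TTA, ...
# "*" marks the three stop codons.
AMINO = (
    "FFLLSSSSYY**CC*W"
    "LLLLPPPPHHQQRRRR"
    "IIIMTTTTNNKKSSRR"
    "VVVVAAAADDEEGGGG"
)

CODON_TABLE = {
    "".join(codon): symbol
    for codon, symbol in zip(product(BASES, repeat=3), AMINO)
}


def translate(dna):
    """Translate a DNA (or RNA) coding strand, read 5' to 3', codon by codon."""
    dna = dna.upper().replace("U", "T")  # RNA: U is substituted for T
    usable = len(dna) - len(dna) % 3     # ignore a trailing partial codon
    return "".join(CODON_TABLE[dna[i:i + 3]] for i in range(0, usable, 3))


def synonymous_codons(symbol):
    """Return every codon specifying the given one-letter symbol (Table III)."""
    return sorted(c for c, s in CODON_TABLE.items() if s == symbol)
```

For instance, `synonymous_codons("L")` lists the six leucine codons of Table III, which is the set from which a designer would choose when selecting alternative codons.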





A "palindromic" recognition site for restriction endonuclease cleavage of double stranded DNA is one
which displays "left-to-right and right-to-left" symmetry between top and bottom base complements, i.e.,
where "readings" of complementary base sequences of the recognition site from 5' to 3' ends are identical.
Examples of palindromic six base recognition sites for restriction endonuclease cleavage include the sites
for cleavage by HindIII wherein top and bottom strands read from 5' to 3' as AAGCTT. A non-palindromic
six base restriction site is exemplified by the site for cleavage by EcoP15, the top strand of which
reportedly reads CAGCAG. The bottom strand base complement, when read 5' to 3', is CTGCTG.
Essentially by definition, restriction sites comprising odd numbers of bases (e.g., 5, 7) are non-palindromic.
Certain endonucleases will cleave at variant forms of a site, which may be palindromic or not. For example,
XhoII will recognize a site which reads (any purine)GATC(any pyrimidine) including the palindromic
sequence AGATCT and the non-palindromic sequence GGATCT. Referring to the previously-noted "BRL
Restriction Endonuclease Reference Chart," endonucleases recognizing six base palindromic sites exclusively
include BbrI, ChuI, Hin173, Ein91R, HinbIII, HinbIII, HindIII, HinfII, HsuI, BglII, StuI, RruI, ClaI, AvaIII,
PvuII, SmaI, XmaI, EccI, SacII, SboI, SbrI, ShyI, SstII, TglI, AvrII, PvuI, RshI, RspI, XniI, XorII, XmaIII, BluI,
MsiI, ScuI, SexI, SgoI, SlaI, SluI, SpaI, XhoI, XpaI, Bce170, Bsu1247, PstI, SalPI, XmaII, XorI, EcoRI,
Rsh630I, SacI, SstI, SphI, BamHI, BamKI, BamNI, BamFI, BstI, KpnI, SalI, XamI, HpaI, XbaI, AtuCI, BclI,
CpeI, SstIV, AosI, MstI, BalI, AsuII, and MlaI. Endonucleases which recognize only non-palindromic six base
sequences exclusively include Tth111II, EcoP15, AvaI, and AvrI. Endonucleases recognizing both palindromic
and non-palindromic six base sequences include HaeI, HgiAI, AcyI, AosII, AsuIII, AccI, ChuII, HincII, HindII,
MnnI, XhoII, HaeII, HinHI, NgoI, and EcoRI'.
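The definition of a palindromic site given above amounts to a test that the site equals its own reverse complement. A minimal sketch follows; the function names are illustrative assumptions, and only the four unambiguous bases A, C, G and T are handled.

```python
# Watson-Crick complements for the four DNA bases.
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(site):
    """Return the bottom strand of a site, read 5' to 3'."""
    return site.upper().translate(COMPLEMENT)[::-1]


def is_palindromic(site):
    """True when top and bottom strands read identically from 5' to 3'."""
    site = site.upper()
    return site == reverse_complement(site)
```

Applied to the sites named in the text, `is_palindromic("AAGCTT")` holds for the HindIII site, while the EcoP15 site CAGCAG fails the test because its bottom strand reads CTGCTG. A site with an odd number of bases always fails, since its central base cannot be its own complement.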

Upon determination of the structure of a desired polypeptide to be produced, practice of the present
invention involves: preparation of two or more different specific, continuous double stranded DNA subunit
sequences of 100 or more base pairs in length and having terminal portions of the proper configuration;
serial insertion of subunits into a selected assembly vector with intermediate amplification of the hybrid
vectors in a selected host organism; use of the assembly vector (or an alternate, selected "expression"
vector including the DNA sequence which has been manufactured from the subunits) to transform a
suitable, selected host; and, isolating polypeptide sequences expressed in the host organism. In its most
efficient forms, practice of the invention involves using the same vector for assembly of the manufactured
sequence and for large scale expression of the polypeptide. Similarly, the host microorganism employed for
expression will ordinarily be the same as employed for amplifications performed during the subunit
assembly process.

The manufactured DNA sequence may be provided with a promoter/regulator region for autonomous 
control of expression or may be incorporated into a vector in a manner providing for control of expression 
by a promoter/regulator sequence extant in the vector. Manufactured DNA sequences of the invention may 
suitably be incorporated into existing plasmid-borne genes (e.g., β-galactosidase) to form fusion genes
coding for fusion polypeptide products including the desired amino acid sequences coded for by the
manufactured DNA sequences. 

In practice of the invention in its preferred forms, polypeptides produced may vary in size from about 
65 or 70 amino acids up to about 200 or more amino acids. High levels of expression of the desired 
polypeptide by selected transformed host organisms are facilitated through the manufacture of DNA
sequences which include one or more alternative codons which are preferentially expressed by the host.

Manufacture of double stranded subunit DNA sequences of 100 to 200 base pairs in length may 
proceed according to prior art assembly methods previously referred to, but is preferably accomplished by 
means of the rapid and efficient procedures disclosed in the aforementioned U.S. Application S.N. 375,493 
by Stabinsky and used in certain of the following examples of actual practice of the present invention. 

Briefly put, these procedures involve the assembly from deoxyoligonucleotides of two or more different,
linear, duplex DNA strands each including a relatively long double stranded region along with a relatively 
short single stranded region on one or both opposing ends of the double strand. The double stranded 
regions are designed to include codons needed to specify assembly of an initial, or terminal or intermediate 
portion of the total amino acid sequence of the desired polypeptide. Where possible, alternative codons
preferentially expressed by a projected host (e.g., E. coli) are employed. Depending on the relative position
to be assumed in the finally assembled subunit DNA sequence, the single stranded region(s) of the duplex 
strands will include a sequence of bases which, when complemented by bases of other duplex strands, also 
provide codons specifying amino acids within the desired polypeptide sequence. 

Duplex strands formed according to this procedure are then enzymatically annealed to the one or two
different duplex strands having complementary short, single stranded regions to form a desired continuous
double stranded subunit DNA sequence which codes for the desired polypeptide fragment. 

High efficiencies and rapidity in total sequence assembly are augmented in such procedures by 
performing a single annealing reaction involving three or more duplex strands, the short, single stranded 
regions of which constitute the base complement of at most one other single stranded region of any other
duplex strand. Providing all duplex strands formed with short single stranded regions which uniquely
complement only one of the single stranded regions of any other duplex is accomplished by alternative 
codon selection within the context of genetic code redundancy, and preferably also in the context of codon 
preferences of the projected host organism. 
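The requirement that each short single stranded region complement at most one other such region can be checked mechanically before any strands are synthesized. The following is a sketch under stated assumptions: each overhang is given as a string read 5' to 3', two overhangs are taken to anneal when one is the exact reverse complement of the other, and the names `partners` and `is_unambiguous` are hypothetical.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(sequence):
    """Return the complementary strand, read 5' to 3'."""
    return sequence.upper().translate(COMPLEMENT)[::-1]


def partners(overhangs):
    """Map each overhang index to the indices of the other overhangs that
    constitute its entire base complement."""
    normalized = [o.upper() for o in overhangs]
    return {
        i: [j for j, b in enumerate(normalized)
            if j != i and b == reverse_complement(a)]
        for i, a in enumerate(normalized)
    }


def is_unambiguous(overhangs):
    """True when no overhang can pair with more than one other overhang."""
    return all(len(found) <= 1 for found in partners(overhangs).values())
```

Where the check fails, the text indicates the remedy: an alternative codon is selected within one of the offending regions so that the pairing becomes unique again.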

The following description of the manufacture of a hypothetical long DNA sequence coding for a
hypothetical polypeptide will serve to graphically illustrate practice of the invention, especially in the context
of formation of proper terminal sequences on subunit DNA sequences. 

A biologically active polypeptide of interest is isolated and its amino acids are sequenced to reveal a 
constitution of 100 amino acid residues in a given continuous sequence. Formation of a manufactured gene 
for microbial expression of the polypeptide will thus require assembly of at least 300 base pairs for insertion
into a selected viral or circular plasmid DNA vector to be used for transformation of a selected host
organism. 

A preliminary consideration in construction of the manufactured gene is the identity of the projected 
microbial host, because foreknowledge of the host allows for codon selection in the context of codon 
preferences of the host species. For purposes of this discussion, the selection of an E. coli bacterial host is
posited. 
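The two preliminary points above, that 100 residues require at least 300 base pairs and that codons are chosen with the host in mind, can be sketched as a simple back-translation. The preference ranking below is a hypothetical placeholder for illustration only and is not measured E. coli codon usage data; `back_translate` and `minimum_coding_length` are likewise assumed names.

```python
# HYPOTHETICAL ranking, most preferred codon first. Placeholder values only;
# a real design would substitute data for the projected host.
HYPOTHETICAL_PREFERENCE = {
    "Met": ["ATG"],
    "Leu": ["CTG", "CTT", "TTA"],
    "Arg": ["CGT", "CGC"],
    "Ala": ["GCT", "GCA"],
}


def minimum_coding_length(residue_count):
    """Base pairs needed to encode the residues alone (three per residue)."""
    return 3 * residue_count


def back_translate(residues, preference=HYPOTHETICAL_PREFERENCE, rank=0):
    """Build a coding strand from three-letter residue names, taking the
    rank-th preferred codon for each residue. When a residue has fewer
    alternatives than `rank`, its last listed codon is used."""
    codons = []
    for residue in residues:
        choices = preference[residue]
        codons.append(choices[min(rank, len(choices) - 1)])
    return "".join(codons)
```

Passing a higher `rank` yields a different sequence coding for the same residues, which is the freedom exploited when a terminal region or restriction site must be adjusted without changing the polypeptide.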

A second consideration in construction of the manufactured gene is the identity of the projected DNA 
vector employed in the assembly process. Selection of a suitable vector is based on existing knowledge of
sites for cleavage of the vector by restriction endonuclease enzymes. More particularly, the assembly
vector is selected on the basis of including DNA sequences providing endonuclease cleavage sites which
will permit easy insertion of the subunits. In this regard, the assembly vector selected preferably has at
least two restriction sites which occur only once (i.e., are "unique") in the vector prior to performance of any
subunit insertion processes. For the purposes of this description, the selection of a hypothetical circular
DNA plasmid pBR3000 having a single EcoRI restriction site, i.e.,

-GAATTC-
-CTTAAG- ,

and a single PvuII restriction site, i.e.,

-CAGCTG-
-GTCGAC- ,

is posited.
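Whether a site is "unique" in a circular vector can be verified by counting its occurrences, taking care to include any occurrence that spans the point at which the circular sequence is written as linear. The sketch below assumes a palindromic site, so that scanning the top strand alone suffices; the function names and the toy sequence in the usage note are illustrative and do not represent any real plasmid.

```python
def count_sites(circular_sequence, site):
    """Count occurrences of `site` on the top strand of a circular sequence.

    The sequence is extended by len(site) - 1 bases from its own start so
    that a site spanning the linearization point is also found. For a
    non-palindromic site the reverse complement would need to be counted
    as well.
    """
    sequence = circular_sequence.upper()
    site = site.upper()
    extended = sequence + sequence[:len(site) - 1]
    return sum(
        1 for start in range(len(sequence))
        if extended.startswith(site, start)
    )


def is_unique(circular_sequence, site):
    """True when the site occurs once and only once."""
    return count_sites(circular_sequence, site) == 1
```

With the toy sequence `TTCAACAGCTGTTGGAA`, `count_sites` finds one PvuII site (CAGCTG) in the interior and one EcoRI site (GAATTC) formed across the join between the last and first bases.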

The amino acid sequence of the desired polypeptide is then analyzed in the context of determining 
availability of alternate codons for given amino acids (preferably in the context of codon preferences of the
projected E. coli host). With this information in hand, two subunit DNA sequences are designed, preferably
having a length on the order of about 150 base pairs, each coding for approximately one-half of the total
amino acid sequence of the desired polypeptide. For purposes of this description, the two subunits
manufactured will be referred to as "A" and "B".

The methods of the present invention as applied to two such subunits generally call for: insertion of
one of the subunits into the assembly vector; amplification of the hybrid vector formed; and insertion of the
second subunit to form a second hybrid including the assembled subunits in the proper sequence. Because
the method involves joining the two subunits together in a manner permitting the joined ends to provide a
continuous preselected sequence of bases coding for a continuous preselected sequence of amino acids,
there exist certain requirements concerning the identity and sequence of the bases which make up the
terminal regions of the manufactured subunits which will be joined to another subunit. Because the method
calls for joining subunits to the assembly vector, there exist other requirements concerning the identity and
sequence of the bases which make up those terminal regions of the manufactured subunits which will be
joined to the assembly vector. Because the subunits are serially, rather than concurrently, inserted into the
assembly vector (and because the methods are most beneficially practiced when the subunits can be
selectively excised from assembled form to allow for alterations in selected base sequences therein), still
further requirements exist concerning the identity of the bases in terminal regions of subunits manufactured.
For ease of understanding in the following discussion of terminal region characteristics, the opposing
terminal regions of subunits A and B are respectively referred to as A-1 and A-2, and B-1 and B-2, viz:



A-1 [ subunit A ] A-2          B-1 [ subunit B ] B-2



Assume that an assembly strategy is developed wherein subunit A is to be inserted into pBR3000 first, 
with terminal region A-1 to be ligated to the vector at the EcoRI restriction site. In the simplest case, the 
terminal region is simply provided with an EcoRI "sticky end", i.e., a single strand of four bases (-AATT- or 
-TTAA-) which will complement a single stranded sequence formed upon EcoRI digestion of pBR3000. This 
will allow ligation of terminal region A-1 to the vector upon treatment with ligase enzyme. Unless the single
strand at the end of terminal region A-1 is preceded by an appropriate base pair

(e.g., 5'-G
       3'-CTTAA-),

the entire recognition site will not be reconstituted upon ligation to the vector. Whether or not the EcoRI
recognition site is reconstituted upon ligation (i.e., whether or not there will be 0 or 1 EcoRI sites remaining 
after insertion of subunit A into the vector) is at the option of the designer of the strategy. Alternatively, one 
may construct the terminal region A-1 of subunit A to include a complete set of base pairs providing a 
recognition site for some other endonuclease, hypothetically designated "XXX", and then add on portions of
the EcoRI recognition site as above to provide an EcoRI "linker". To be of practical use in excising subunit
A from an assembled sequence, the "XXX" site should not appear elsewhere in the hybrid plasmid formed
upon insertion. The requirement for construction of terminal region A-1 is, therefore, that it comprise a
portion (i.e., all or part) of a base sequence which provides a recognition site for cleavage by a restriction
endonuclease, which recognition site is entirely present either once or not at all in the assembly vector
upon insertion of the subunit.
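The designer's option of reconstituting or destroying the EcoRI site reduces to the choice of a single base pair next to the sticky end. The sketch below models only that choice. It assumes that the EcoRI-cut vector end contributes the G of GAATTC, that the subunit end carries the four base AATT overhang, and that `first_duplex_base` (a hypothetical parameter name) is the base immediately following the overhang on the same strand, read 5' to 3'.

```python
ECORI_SITE = "GAATTC"
ECORI_OVERHANG = "AATT"


def junction_after_ligation(first_duplex_base):
    """Six bases spanning the vector/subunit junction after ligation.

    The vector end supplies "G", the annealed sticky ends supply "AATT",
    and the subunit supplies the base that follows its overhang.
    """
    return "G" + ECORI_OVERHANG + first_duplex_base.upper()


def site_reconstituted(first_duplex_base):
    """True when ligation restores a complete EcoRI recognition site."""
    return junction_after_ligation(first_duplex_base) == ECORI_SITE
```

Only a C at that position (paired with G on the opposite strand) restores GAATTC; any other base leaves the joined ends ligated but no longer cleavable by EcoRI.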

Assume that terminal region B-2 of subunit B is also to be joined to the assembly vector (e.g., at the 
single recognition site for PvuII cleavage present on pBR3000). The requirements for construction of
terminal region B-2 are the same as for construction of A-1, except that the second endonuclease enzyme
in reference to which the construction of B-2 is made must be different from that with respect to which the
construction of A-1 is made. If the recognition sites are the same, one will not be able to separately excise
segments A and B from the fully assembled sequence.

The above assumptions require, then, that terminal region A-2 is to be ligated to terminal region B-1 in 
the final pBR3000 hybrid. Either the terminal region A-2 or the terminal region B-1 is constructed to 
comprise a portion of a (preferably palindromic six base) recognition site for restriction endonuclease
cleavage by a hypothetical third endonuclease "YYY" which recognition site will be entirely present once and
only once in the expression vector upon insertion of all subunits thereinto, i.e., at an intermediate position in
the assemblage of subunits. There exist a number of strategies for obtaining this result. In one alternative
strategy, the entire recognition site of "YYY" is contained in terminal region A-2 and the region additionally
includes the one or more portions of other recognition sites for endonuclease cleavage needed to (1)
complete the insertion of subunit A into the assembly vector for amplification purposes, and (2) allow for
subsequent joining of subunit A to subunit B. In this case, terminal region B-1 would have at its end only 
the bases necessary to link it to terminal region A-2. In another alternative, the entire "YYY" recognition site 
is included in terminal region B-1 and B-1 further includes at its end a portion of a recognition site for 
endonuclease cleavage which is useful for joining subunit A to subunit B. 

As another alternative, terminal region B-1 may contain at its end a portion of the "YYY" recognition
site. Terminal region A-2 would then contain the entire "YYY" recognition site plus, at its end, a suitable
"linker" for joining A-2 to the assembly vector prior to amplification of subunit A (e.g., a PvuII "sticky end").
After amplification of the hybrid containing subunit A, the hybrid would be cleaved with "YYY" (leaving a
sticky-ended portion of the "YYY" recognition site exposed on the end of A-2) and subunit B could be
inserted with its B-1 terminal region joined with the end of terminal region A-2 to reconstitute the entire
"YYY" recognition site. The requirement for construction of the terminal regions of all segments (other than
A-1 and B-2) is that one or the other or both (i.e., "at least half") comprise a portion (i.e., include all or part)
of a recognition site for third restriction endonuclease cleavage, which recognition site is entirely present
once and only once (i.e., is "unique") in said assembly vector after insertion of all subunits thereinto. To
generate a member of the class of novel DNA sequences of the invention, the recognition site of the third
endonuclease should be a six base palindromic recognition site.

While a subunit "terminal region" as referred to above could be considered to extend from the subunit 
end fully halfway along the subunit to its center, as a practical matter the constructions noted would 
ordinarily be performed in the final 10 or 20 bases. Similarly, while the unique "intermediate" recognition
site in the two subunit assemblage may be up to three times closer to one end of the manufactured
sequence than it is to the other, it will ordinarily be located near the center of the sequence. If, in the above
description, a synthetic plan was generated calling for preparation of three subunits to be joined, the
manufactured gene would include two unique restriction enzyme cleavage sites in intermediate positions at
least one of which will have a palindromic six base recognition site in the class of new DNA sequences of
the invention.

The significant advantages of the above-described process are manifest. Because the manufactured 
gene now includes one or more unique restriction endonuclease cleavage sites at intermediate positions 
along its length, modifications in the codon sequence of the two subunits joined at the cleavage site may be 
effected with great facility and without the need to re-synthesize the entire manufactured gene.

Following are illustrative examples of the actual practice of the invention in formation of manufactured 
genes capable of directing the synthesis of: human leukocyte interferon of the F subtype (IFN-αF) and
analogs thereof; and, multiple consensus leukocyte interferons which, due to homology to IFN-αF, can be
named as IFN-αF analogs. It will be apparent from these examples that the gene manufacturing methodology
of the present invention provides an overall synthetic strategy for the truly rapid, efficient synthesis
and expression of genes of a length in excess of 200 base pairs within a highly flexible framework allowing 
for variations in the structures of products to be expressed which has not heretofore been available to 
investigators practicing recombinant DNA techniques. 


EXAMPLE 1 

The amino acid sequence for the human leukocyte interferon of the F subtype has been deduced by
way of sequencing of cDNA clones. See, e.g., Goeddel, et al., Nature, 290, pp. 20-26 (1981). The general
procedures of prior Examples 1, 2 and 3 were employed in the design and assembly of a manufactured
DNA sequence for use in microbial expression of IFN-αF in E. coli by means of a pBR322-derived
expression vector. A general plan for the construction of three "major" subunit DNA sequences (LeuIFN-F I,
LeuIFN-F II and LeuIFN-F III) and one "minor" subunit DNA sequence (LeuIFN-F IV) was evolved and is
shown in Table IV below.
As in the case of the gene manufacture strategy set out in Table IV, the strategy of Table VII involves
use of bacterial preference codons wherever it is not inconsistent with deoxyribonucleotide segment
constructions. Construction of an expression vector with the subunits was similar to that involved with the
IFN-γ-specifying gene, with minor differences in restriction enzymes employed. Subunit I is ligated into
pBR322 cut with EcoRI and SalI. (Note that the subunit terminal portion includes a single stranded SalI
"sticky end" but, upon complementation, a SalI recognition site is not reconstituted. A full BamHI
recognition site remains, however, allowing for subsequent excision of the subunit.) This first intermediate
plasmid is amplified and subunit II is inserted into the amplified plasmid after again cutting with EcoRI and
SalI. The second intermediate plasmid thus formed is amplified and subunit III is inserted into the amplified
plasmid cut with EcoRI and HindIII. The third intermediate plasmid thus formed is amplified. Subunit IV is
ligated to an EcoRI and XbaI fragment isolated from pINTγ-TXb4 of Example 4 and this ligation product
(having EcoRI and BstEII sticky ends) is then inserted into the amplified third intermediate plasmid cut with
EcoRI and BstEII to yield the final expression vector.

The isolated product of trp promoter/operator controlled E. coli expression of the manufactured DNA
sequence of Table IV as inserted into the final expression vector was designated IFN-αF1.

EXAMPLE 2 

As discussed infra with respect to consensus leukocyte interferon, those human leukocyte interferon
subtypes having a threonine residue at position 14 and a methionine residue at position 16 are reputed to
display greater antiviral activity than those subtypes possessing Ala14 and Ile16 residues. An analog of
human leukocyte interferon subtype F was therefore manufactured by means of microbial expression of a
DNA sequence of Example 1 which had been altered to specify threonine and methionine as residues 14
and 16, respectively. More specifically, [Thr14, Met16] IFN-αF, designated IFN-αF2, was expressed in E. coli
upon transformation with a vector of Example 1 which had been cut with SalI and HindIII and into which a
modified subunit II (of Table VII) was inserted. The specific modifications of subunit II involved assembly
with segment 39 altered to replace the alanine-specifying codon, GCT, with a threonine-specifying ACT
codon and replace the isoleucine-specifying codon, ATT, with an ATG codon. Corresponding changes in
complementary bases were made in section 40 of subunit LeuIFN-F II.
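The two codon replacements described above can be illustrated on a short fragment. This is a sketch only: the three-codon fragment below covers residues 14 to 16, its first and third codons are those named in the text, and its middle codon (CTG) is an assumption borrowed from the corresponding position of Table V. The helper `replace_codon` is a hypothetical name.

```python
def replace_codon(coding_sequence, codon_index, expected, replacement):
    """Swap one codon (0-based index) after confirming the codon found
    there is the one the designer expects to be replacing."""
    start = 3 * codon_index
    found = coding_sequence[start:start + 3]
    if found != expected:
        raise ValueError(
            f"expected {expected} at codon {codon_index}, found {found}"
        )
    return coding_sequence[:start] + replacement + coding_sequence[start + 3:]


# Illustrative fragment: Ala14 (GCT), residue 15 (assumed CTG), Ile16 (ATT).
fragment = "GCT" + "CTG" + "ATT"

# Ala14 -> Thr14 (GCT -> ACT), then Ile16 -> Met16 (ATT -> ATG).
analog = replace_codon(fragment, 0, "GCT", "ACT")
analog = replace_codon(analog, 2, "ATT", "ATG")
```

Because the change is confined to one subunit bounded by unique restriction sites, only that subunit need be reassembled, and the complementary strand is altered to match.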

The following Examples 2 and 3 relate to practice of the invention in the microbial synthesis of
consensus human leukocyte interferon polypeptides which can be designated as analogs of human
leukocyte interferon subtype F.

EXAMPLE 3 


"Consensus human leukocyte interferon" ("IFN-Con," "LeulFN-Con") as employed herein shall mean a 
non-naturally-occurring polypeptide which predominantly includes those amino acid residues which are 
common to all naturally-occurring human leukocyte interferon subtype sequences and which includes, at 
one or more of those positions wherein there is no amino acid common to all subtypes, an amino acid
which predominantly occurs at that position and in no event includes any amino acid residue which is not
extant in that position in at least one naturally-occurring subtype. (For purposes of this definition, subtype A
is positionally aligned with other subtypes and thus reveals a "missing" amino acid at position 44.) As so
defined, a consensus human leukocyte interferon will ordinarily include all known common amino acid 
residues of all subtypes. It will be understood that the state of knowledge concerning naturally-occurring
subtype sequences is continuously developing. New subtypes may be discovered which may destroy the
"commonality" of a particular residue at a particular position. Polypeptides whose structures are predicted 
on the basis of a later-amended determination of commonality at one or more positions would remain within 
the definition because they would nonetheless predominantly include common amino acids and because 
those amino acids no longer held to be common would nonetheless quite likely represent the predominant
amino acid at the given positions. Failure of a polypeptide to include either a common or predominant
amino acid at any given position would not remove the molecule from the definition so long as the residue 
at the position occurred in at least one subtype. Polypeptides lacking one or more internal or terminal 
residues of consensus human leukocyte interferon or including internal or terminal residues having no 
counterpart in any subtype would be considered analogs of human consensus leukocyte interferon.

Published predicted amino acid sequences for eight cDNA-derived human leukocyte interferon subtypes
were analyzed in the context of the identities of amino acids within the sequence of 166 residues.
See, generally, Goedell, et al., Nature, 290, pp. 20-26 (1981) comparing LeIFN-A through LeIFN-H and
noting that only 79 amino acids appear in identical positions in all eight interferon forms and 99 amino acids
appear in identical positions if the E subtype (deduced from a cDNA pseudogene) was ignored. Each of the
remaining positions was analyzed for the relative frequency of occurrence of a given amino acid and, where
a given amino acid appeared at the same position in at least five of the eight forms, it was designated as
the predominant amino acid for that position. A "consensus" polypeptide sequence of 166 amino acids was
plotted out and compared back to the eight individual sequences, resulting in the determination that LeIFN-F
required few modifications from its "naturally-occurring" form to comply with the consensus sequence.
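
The position-by-position analysis described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the eight four-residue sequences are invented toy data, not the published LeIFN-A through LeIFN-H sequences, and the function name `consensus` and the label "undetermined" (for positions where no residue reaches the five-of-eight threshold) are hypothetical choices, since the text does not say how such positions were labeled.

```python
from collections import Counter

def consensus(aligned, min_count):
    """Classify each aligned position as common, predominant or undetermined.

    aligned   -- list of equal-length, positionally aligned sequences
    min_count -- occurrences needed for a residue to be called predominant
    """
    length = len(aligned[0])
    if any(len(seq) != length for seq in aligned):
        raise ValueError("sequences must be aligned to the same length")
    calls = []
    for column in zip(*aligned):
        residue, count = Counter(column).most_common(1)[0]
        if count == len(aligned):
            calls.append((residue, "common"))        # shared by every subtype
        elif count >= min_count:
            calls.append((residue, "predominant"))   # e.g. at least 5 of 8
        else:
            calls.append((None, "undetermined"))     # no residue reaches the threshold
    return calls

# Toy data only (hypothetical): eight aligned four-residue fragments.
toy_subtypes = ["CDLP", "CDLP", "CDLP", "CDLP", "CDIP", "CDIS", "CNIS", "CNIS"]
calls = consensus(toy_subtypes, min_count=5)
for position, (residue, kind) in enumerate(calls, start=1):
    print(position, residue, kind)
```

In the toy data the first position is common to all eight sequences, the second and fourth have a predominant residue, and the third is split four to four, so no predominant residue is assigned.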

A program for construction of a manufactured IFN-Con DNA sequence was developed and is set out
below in Table V. In the table, an asterisk designates the variations in IFN-αF needed to develop
LeIFN-Con1, i.e., to develop the [Arg22, Ala76, Asp78, Glu79, Tyr86, Tyr90, Leu96, Thr156, Asn157, Leu158] analog of
IFN-αF. The illustrated top strand sequence includes, wherever possible, codons noted to be the subject of
preferential expression in E. coli. The sequence also includes bases providing recognition sites for SalI,
HindIII, and BstEII at positions intermediate the sequence and for XbaI and BamHI at its ends. The latter
sites are selected for use in incorporation of the sequence in a pBR322 vector, as was the case with the
sequence developed for IFN-αF and its analogs.
TABLE V

-1  1                                   10
Met-Cys-Asp-Leu-Pro-Gln-Thr-His-Ser-Leu-Gly-Asn-Arg-Arg-
ATG TGT GAT TTA CCT CAA ACT CAT TCT CTT GGT AAC CGT CGC

                        20      *
Ala-Leu-Ile-Leu-Leu-Ala-Gln-Met-Arg-Arg-Ile-Ser-Pro-Phe-
GCT CTG ATT CTG CTG GCA CAG ATG CGT CGT ATT TCC CCG TTT

        30                                      40
Ser-Cys-Leu-Lys-Asp-Arg-His-Asp-Phe-Gly-Phe-Pro-Gln-Glu-
AGC TGC CTG AAA GAC CGT CAC GAC TTC GGC TTT CCG CAA GAA

                                50
Glu-Phe-Asp-Gly-Asn-Gln-Phe-Gln-Lys-Ala-Gln-Ala-Ile-Ser-
GAG TTC GAT GGC AAC CAA TTC CAG AAA GCT CAG GCA ATC TCT

                60
Val-Leu-His-Glu-Met-Ile-Gln-Gln-Thr-Phe-Asn-Leu-Phe-Ser-
GTA CTG CAC GAA ATG ATC CAA CAG ACC TTC AAC CTG TTT TCC

70                      *       *   *   80
Thr-Lys-Asp-Ser-Ser-Ala-Ala-Trp-Asp-Glu-Ser-Leu-Leu-Glu-
ACT AAA GAC AGC TCT GCT GCT TGG GAC GAA AGC TTG CTG GAG

        *              *90                      *
Lys-Phe-Tyr-Thr-Glu-Leu-Tyr-Gln-Gln-Leu-Asn-Asp-Leu-Glu-
AAG TTC TAC ACT GAA CTG TAT CAG CAG CTG AAC GAC CTG GAA

        100                                     110
Ala-Cys-Val-Ile-Gln-Glu-Val-Gly-Val-Glu-Glu-Thr-Pro-Leu-
GCA TGC GTA ATC CAG GAA GTT GGT GTA GAA GAG ACT CCG CTG

                                120
Met-Asn-Val-Asp-Ser-Ile-Leu-Ala-Val-Lys-Lys-Tyr-Phe-Gln-
ATG AAC GTC GAC TCT ATT CTG GCA GTT AAA AAG TAC TTC CAG

                130
Arg-Ile-Thr-Leu-Tyr-Leu-Thr-Glu-Lys-Lys-Tyr-Ser-Pro-Cys-
CGT ATC ACT CTG TAC CTG ACC GAA AAG AAA TAT TCT CCG TGC

140                                     150
Ala-Trp-Glu-Val-Val-Arg-Ala-Glu-Ile-Met-Arg-Ser-Phe-Ser-
GCT TGG GAA GTA GTT CGC GCT GAA ATT ATG CGT TCT TTC TCT

        *   *   *       160                     166 Stop
Leu-Ser-Thr-Asn-Leu-Gln-Glu-Arg-Leu-Arg-Arg-Lys-Glu
CTG TCT ACT AAC CTG CAG GAG CGT CTG CGC CGT AAA GAA TAA

Stop
TAG
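
As a cross-check on the table, the following Python sketch translates the top strand of Table V with the standard genetic code and searches it for the three internal recognition sites named in the text. It is an illustration, not part of the described procedure: the recognition sequences GTCGAC (SalI), AAGCTT (HindIII) and GGTNACC (BstEII) are the standard ones for these enzymes, and the names `TABLE_V`, `translate` and `find_sites` are hypothetical.

```python
import re
from itertools import product

# Top strand of Table V, codon by codon, from Met(-1) through the first stop codon.
TABLE_V = """
ATG TGT GAT TTA CCT CAA ACT CAT TCT CTT GGT AAC CGT CGC
GCT CTG ATT CTG CTG GCA CAG ATG CGT CGT ATT TCC CCG TTT
AGC TGC CTG AAA GAC CGT CAC GAC TTC GGC TTT CCG CAA GAA
GAG TTC GAT GGC AAC CAA TTC CAG AAA GCT CAG GCA ATC TCT
GTA CTG CAC GAA ATG ATC CAA CAG ACC TTC AAC CTG TTT TCC
ACT AAA GAC AGC TCT GCT GCT TGG GAC GAA AGC TTG CTG GAG
AAG TTC TAC ACT GAA CTG TAT CAG CAG CTG AAC GAC CTG GAA
GCA TGC GTA ATC CAG GAA GTT GGT GTA GAA GAG ACT CCG CTG
ATG AAC GTC GAC TCT ATT CTG GCA GTT AAA AAG TAC TTC CAG
CGT ATC ACT CTG TAC CTG ACC GAA AAG AAA TAT TCT CCG TGC
GCT TGG GAA GTA GTT CGC GCT GAA ATT ATG CGT TCT TTC TCT
CTG TCT ACT AAC CTG CAG GAG CGT CTG CGC CGT AAA GAA TAA
""".split()

# Standard genetic code, one-letter amino acids, bases ordered T, C, A, G.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): a for c, a in zip(product(BASES, repeat=3), AMINO)}

def translate(codons):
    """Translate codons into one-letter residues, stopping at the first stop codon."""
    residues = []
    for codon in codons:
        amino = CODON_TABLE[codon]
        if amino == "*":
            break
        residues.append(amino)
    return "".join(residues)

# Recognition sequences written as regular expressions (N = any base).
SITES = {"SalI": "GTCGAC", "HindIII": "AAGCTT", "BstEII": "GGT[ACGT]ACC"}

def find_sites(dna):
    """Return the 0-based start offsets of each recognition site in dna."""
    return {name: [m.start() for m in re.finditer(pattern, dna)]
            for name, pattern in SITES.items()}

protein = translate(TABLE_V)          # index 0 is Met(-1), index n is residue n
sites = find_sites("".join(TABLE_V))
print(len(protein), protein[:14])
print(sites)
```

Because the initiator Met is numbered -1 and there is no residue 0, index n of `protein` is residue n of the table, which makes the asterisked positions easy to inspect (for example `protein[22]` for Arg22).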



Table VI below sets out the specific double stranded DNA sequences for preparation of 4 subunit DNA
sequences for use in manufacture of IFN-Con1. Subunit LeuIFN-Con IV is a duplicate of LeuIFN-F IV of
Table V. Segments of subunits which differ from those employed to construct the IFN-αF gene are
designated with a "prime" (e.g., 37' and 38' are altered forms of sections 37 and 38 needed to provide
arginine rather than glycine at position 22).



TABLE VI

[Double-stranded DNA sequences of the four subunits LeuIFN-Con I through LeuIFN-Con IV; the sequence content is not recoverable.]



The four subunits of Table VI were sequentially inserted into an expression vector according to the
procedure of Example 1 to yield a vector having the coding region of Table V under control of a trp
promoter/operator. The product of expression of this vector in E. coli was designated IFN-Con1. It will be
noted that this polypeptide includes all common residues indicated in Goedell, et al., supra, and, with the
exception of Ser80, Glu83, Val114, and Lys121, included the predominant amino acid indicated by analysis of
the reference's summary of sequences. The four above-noted residues were retained from the native
IFN-αF sequence to facilitate construction of subunits and assembly of subunits into an expression vector.
(Note, e.g., serine was retained at position 80 to allow for construction of a HindIII site.)

Since publication of the Goedell, et al. summary of IFN-α subtypes, a number of additional subtypes
have been ascertained. Figure 1 sets out in tabular form the deduced sequences of the 13 presently known
subtypes (exclusive of those revealed by five known cDNA pseudogenes) with designations of the same
IFN-α subtypes from different laboratories indicated parenthetically (e.g., IFN-α6 and IFN-αK). See, e.g.,
Goedell, et al., supra; Stebbing, et al., in: Recombinant DNA Products, Insulin, Interferons and Growth
Hormones (A. Bollon, ed.), CRC Press (1983); and Weissman, et al., U.C.L.A. Symp. Mol. Cell Biol., 25, pp.
295-326 (1982). Positions where there is no common amino acid are shown in bold face. IFN-α subtypes
are roughly grouped on the basis of amino acid residues. In seven positions (14, 16, 71, 78, 79, 83, and
160) the various subtypes show just two alternative amino acids, allowing classification of the subtypes into
two subgroups (I and II), based on which of the seven positions are occupied by the same amino acid
residues. Three IFN-α subtypes (H, F and B) cannot be classified as Group I or Group II and, in terms of
distinguishing positions, they appear to be natural hybrids of both group subtypes. It has been reported that
IFN-α subtypes of the Group I type display relatively high antiviral activity while those of Group II display
relatively high antitumor activity.

IFN-Con1 structure is described in the final line of the Figure. It is noteworthy that certain residues of
IFN-Con1 (e.g., serine at position 8) which were determined to be "common" on the basis of the Goedell, et
al., sequences are now seen to be "predominant". Further, certain of the IFN-Con1 residues determined to
be predominant on the basis of the reference (Arg22, Asp78, Glu79, and Tyr86) are no longer so on the basis
of updated information, while certain heretofore non-predominant others (Ser80 and Glu83) now can be
determined to be predominant.

EXAMPLE 4

A human consensus leukocyte interferon which differed from IFN-Con1 in terms of the identity of amino
acid residues at positions 14 and 16 was prepared by modification of the DNA sequence coding for
IFN-Con1. More specifically, the expression vector for IFN-Con1 was treated with BstEII and HindIII to delete
subunit LeuIFN Con III. A modified subunit was inserted wherein the alanine-specifying codon, GCT, of
sections 39 and 40 was altered to a threonine-specifying codon, ACT, and the isoleucine codon, CTG, was
changed to ATG. The product of expression of the modified manufactured gene, [Thr14, Met16, Arg22, Ala76,
Asp78, Glu79, Tyr86, Tyr90, Leu96, Thr156, Asn157, Leu158] IFN-αF, was designated IFN-Con2.
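
The two codon substitutions of this example can be illustrated on the opening stretch of the Table V top strand. The sketch below is a hedged illustration only: it edits a list of codons in software rather than modelling the subunit replacement actually performed, the helper names `codon_index` and `substitute` are hypothetical, and the fragment is copied from Table V, where the codon at position 16 reads ATT.

```python
def codon_index(position):
    """Map a Table V residue number (-1, 1, 2, ...) to a 0-based codon index."""
    if position == -1:
        return 0            # initiator Met is numbered -1; there is no residue 0
    if position >= 1:
        return position
    raise ValueError("residue numbering runs -1, 1, 2, ...")

def substitute(codons, changes):
    """Return a copy of codons with the codons at the given residue numbers replaced."""
    result = list(codons)
    for position, new_codon in changes.items():
        result[codon_index(position)] = new_codon
    return result

# Residues -1 through 27 of Table V (first two rows of the table).
fragment = (
    "ATG TGT GAT TTA CCT CAA ACT CAT TCT CTT GGT AAC CGT CGC "
    "GCT CTG ATT CTG CTG GCA CAG ATG CGT CGT ATT TCC CCG TTT"
).split()

# Position 14: Ala (GCT) to Thr (ACT); position 16: to Met (ATG).
con2_fragment = substitute(fragment, {14: "ACT", 16: "ATG"})
print(" ".join(fragment[14:17]), "->", " ".join(con2_fragment[14:17]))
```

Working on whole codons rather than on nucleotide offsets keeps the reading frame intact by construction, which is the property the manufactured-gene approach relies on when individual residues are exchanged.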

Presently being constructed is a gene for a consensus human leukocyte interferon polypeptide which
will differ from IFN-Con1 in terms of the identity of residues at positions 114 and 121. More specifically, the
Val114 and Lys121 residues which duplicate IFN-αF subtype residues but are not predominant amino acids
will be changed to the predominant Glu114 and Arg121 residues, respectively. Because the codon change
from Val114 to Glu114 (e.g., GTC to GAA) will no longer allow for a SalI site at the terminal portion of subunit
LeuIFN Con I (of Table VI), subunits I and II will likely need to be constructed as a single subunit. Changing
the AAA, lysine, codon of sections 11 and 12 to CTG will allow for the presence of arginine at position 121.
The product of microbial expression of the manufactured gene, [Arg22, Ala76, Asp78, Glu79, Tyr86, Tyr90,
Leu96, Glu114, Arg121, Thr156, Asn157, Leu158] IFN-αF, will be designated IFN-Con3.
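
The loss of the SalI site can be checked directly on the codons surrounding position 114 in Table V. The following sketch assumes the standard SalI recognition sequence GTCGAC; the helper name `has_site` is hypothetical.

```python
SALI = "GTCGAC"  # standard SalI recognition sequence

def has_site(codons, site):
    """Report whether the joined codons contain the recognition sequence."""
    return site in "".join(codons)

# Residues 112-116 of Table V: Met-Asn-Val-Asp-Ser.
region = "ATG AAC GTC GAC TCT".split()

# Replace the Val114 codon (GTC) with a Glu codon (GAA).
changed = list(region)
changed[2] = "GAA"

print(has_site(region, SALI))   # site spans the Val114 and Asp115 codons
print(has_site(changed, SALI))  # site is gone once GTC becomes GAA
```

The site straddles the codons for Val114 and Asp115, so no glutamic acid codon (GAA or GAG) at position 114 can preserve it, which is consistent with the stated need to build subunits I and II as one piece.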

The following example relates to antiviral activity screening of human leukocyte interferon and 
polypeptides provided by the preceding examples. 

EXAMPLE 5 

Table VII below provides the results of testing of antiviral activity in various cell lines of natural (buffy
coat) interferon and isolated, microbially-expressed, polypeptides designated IFN-αF1, IFN-αF2, IFN-Con1,
and IFN-Con2. Viruses used were VSV (vesicular stomatitis virus) and EMCV (encephalomyocarditis virus).
Cell lines were from various mammalian sources, including human (WISH, HeLa), bovine (MDBK), mouse
(MLV-6), and monkey (Vero). Antiviral activity was determined by an end-point cytopathic effect assay as
described in Weck, et al., J. Gen. Virol., 57, pp. 233-237 (1981) and Campbell, et al., Can. J. Microbiol., 21, pp.
1247-1253 (1975). Data shown were normalized for antiviral activity in WISH cells.
TABLE VII

Virus  Cell Line  Buffy Coat   IFN-αF1      IFN-αF2      IFN-Con1     IFN-Con2

VSV    WISH       100          100          100          100          100
VSV    HeLa       [illegible]  [illegible]  [illegible]  [illegible]  [illegible]
VSV    MDBK       1600         33           ND           200          300
VSV    MLV-6      20           5            ND           3            20
VSV    Vero       10           0.1          ND           10           0.1
EMCV   WISH       100          100          100          100          100
EMCV   HeLa       100          5            ND           33           33
EMCV   Vero       100          20           ND           1000         10

ND = no data presently available.
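
The normalization to WISH cells mentioned above can be sketched as follows. This is an assumption-laden illustration: the text does not give the raw titers or the exact arithmetic, so the sketch assumes a simple scaling in which the activity measured on WISH cells is set to 100, and the titer values and the name `normalize_to_reference` are hypothetical.

```python
def normalize_to_reference(titers, reference="WISH"):
    """Scale activities so that the reference cell line reads 100."""
    ref = titers[reference]
    if ref <= 0:
        raise ValueError("reference activity must be positive")
    return {cell: 100.0 * value / ref for cell, value in titers.items()}

# Hypothetical raw antiviral titers (units/ml) for one interferon preparation.
raw_titers = {"WISH": 2.0e6, "MDBK": 4.0e6, "Vero": 2.0e5}
relative = normalize_to_reference(raw_titers)
print(relative)
```

Under this assumption every preparation reads 100 on WISH cells, as in the WISH rows of Table VII, and the remaining entries express activity on other cell lines relative to that value.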







It will be apparent from the above examples that the present invention provides, for the first time, an
entire new genus of synthesized, biologically active proteinaceous products which products differ from
naturally-occurring forms in terms of the identity and/or location of one or more amino acids and in terms of
one or more biological (e.g., antibody reactivity) and pharmacological (e.g., potency or duration of effect)
properties but which substantially retain other such properties.

Products of the present invention and/or antibodies thereto may be suitably "tagged", for example
radiolabeled (e.g., with I-125), conjugated with enzymes or fluorescently labelled, to provide reagent materials
useful in assays and/or diagnostic test kits, for the qualitative and/or quantitative determination of the
presence of such products and/or said antibodies in fluid samples. Such antibodies may be obtained from
the inoculation of one or more animal species (e.g., mice, rabbit, goat, human, etc.) or from monoclonal
antibody sources. Any of such reagent materials may be used alone or in combination with a suitable
substrate, e.g., coated on a glass or plastic particle bead.

Numerous modifications and variations in the practice of the invention are expected to occur to those 
skilled in the art upon consideration of the foregoing illustrative examples. Consequently, the invention 
should be considered as limited only to the extent reflected by the appended claims. 

Claims

Claims for the following Contracting States : BE, CH, DE, FR, GB, LI, LU, NL, SE 

1. A consensus leukocyte interferon protein predominantly including those amino acids which are
common to all naturally-occurring human alpha IFN subtype sequences and including at one or more
positions where there is no amino acid common to all subtypes an amino acid which predominantly
occurs at that position and in no event including any amino acid residue which is not extant at that
position in at least one naturally-occurring subtype.

2. A consensus human leukocyte interferon protein according to Claim 1 selected from the group
consisting of:

[Arg22, Ala76, Asp78, Glu79, Tyr86, Tyr90, Leu96, Thr156, Asn157, Leu158] IFN-αF;

[Thr14, Met16, Arg22, Ala76, Asp78, Glu79, Tyr86, Tyr90, Leu96, Thr156, Asn157, Leu158] IFN-αF; and

[Arg22, Ala76, Asp78, Glu79, Tyr86, Tyr90, Leu96, Glu114, Arg121, Thr156, Asn157, Leu158] IFN-αF.


3. A gene capable of directing the synthesis in a selected host microorganism of consensus human 
leukocyte interferon protein according to Claim 1 or 2. 

Claims for the following Contracting State : AT 

1. Product produced by a microbiological process, comprising a consensus leukocyte interferon protein
predominantly including those amino acids which are common to all naturally-occurring human alpha
IFN subtype sequences and including at one or more positions where there is no amino acid common
to all subtypes an amino acid which predominantly occurs at that position and in no event including any
amino acid residue which is not extant at that position in at least one naturally-occurring subtype.

2. Product produced by a microbiological process, comprising a consensus human leukocyte interferon
protein according to Claim 1 selected from the group consisting of:

[Arg22, Ala76, Asp78, Glu79, Tyr86, Tyr90, Leu96, Thr156, Asn157, Leu158] IFN-αF;

[Thr14, Met16, Arg22, Ala76, Asp78, Glu79, Tyr86, Tyr90, Leu96, Thr156, Asn157, Leu158] IFN-αF; and

[Arg22, Ala76, Asp78, Glu79, Tyr86, Tyr90, Leu96, Glu114, Arg121, Thr156, Asn157, Leu158] IFN-αF.



3. A gene (DNA molecule) produced by a microbiological process and capable of directing the synthesis 
in a selected host microorganism of consensus human leukocyte interferon protein according to Claim 
1 or 2. 

Claims (German text: Patentansprüche)

Claims for the following Contracting States : BE, CH, DE, FR, GB, LI, LU, NL, SE

1. A consensus leukocyte interferon protein which predominantly includes those amino acids that are
common to all naturally occurring human alpha IFN subtype sequences, and which includes, at one
or more positions at which there is no amino acid common to all subtypes, an amino acid that
predominantly occurs at that position, and which in no event includes any amino acid residue that is
not present at that position in at least one naturally occurring subtype.

2. A consensus human leukocyte interferon protein according to Claim 1, selected from the group
consisting of:

[Arg22, Ala76, Asp78, Glu79, Tyr86, Tyr90, Leu96, Thr156, Asn157, Leu158] IFN-αF;

[Thr14, Met16, Arg22, Ala76, Asp78, Glu79, Tyr86, Tyr90, Leu96, Thr156, Asn157, Leu158] IFN-αF; and

[Arg22, Ala76, Asp78, Glu79, Tyr86, Tyr90, Leu96, Glu114, Arg121, Thr156, Asn157, Leu158] IFN-αF.

3. A gene which is capable of directing the synthesis of consensus human leukocyte interferon protein
according to Claim 1 or 2 in a selected host microorganism.

Claims for the following Contracting State : AT

1. Product produced by a microbiological process, comprising a consensus leukocyte interferon
protein which predominantly includes those amino acids that are common to all naturally occurring
human alpha IFN subtype sequences, and which includes, at one or more positions at which there is
no amino acid common to all subtypes, an amino acid that predominantly occurs at that position, and
which in no event includes any amino acid residue that is not present at that position in at least one
naturally occurring subtype.

2. Product produced by a microbiological process, comprising a consensus human leukocyte
interferon protein according to Claim 1, selected from the group consisting of:

[Arg22, Ala76, Asp78, Glu79, Tyr86, Tyr90, Leu96, Thr156, Asn157, Leu158] IFN-αF;

[Thr14, Met16, Arg22, Ala76, Asp78, Glu79, Tyr86, Tyr90, Leu96, Thr156, Asn157, Leu158] IFN-αF; and

[Arg22, Ala76, Asp78, Glu79, Tyr86, Tyr90, Leu96, Glu114, Arg121, Thr156, Asn157, Leu158] IFN-αF.

3. A gene (DNA molecule) which is produced by a microbiological process and is capable of directing
the synthesis of consensus human leukocyte interferon protein according to Claim 1 or 2 in a
selected host microorganism.



Claims (French text: Revendications)

Claims for the following Contracting States : BE, CH, DE, FR, GB, LI, LU, NL, SE

1. Consensus leukocyte interferon, predominantly including the amino acids which are common to all
naturally existing human alpha IFN subtype sequences and including, at one or more positions where
there is no amino acid common to all the subtypes, an amino acid which predominantly exists at that
position, and in no case including any amino acid residue which does not exist at that position in at
least one naturally existing subtype.

2. Consensus human leukocyte interferon according to Claim 1, selected from the group consisting of:

[Arg22, Ala76, Asp78, Glu79, Tyr86, Tyr90, Leu96, Thr156, Asn157, Leu158] IFN-αF;

[Thr14, Met16, Arg22, Ala76, Asp78, Glu79, Tyr86, Tyr90, Leu96, Thr156, Asn157, Leu158] IFN-αF; and

[Arg22, Ala76, Asp78, Glu79, Tyr86, Tyr90, Leu96, Glu114, Arg121, Thr156, Asn157, Leu158] IFN-αF.

3. Gene capable of directing the synthesis, in a selected host microorganism, of consensus human
leukocyte interferon according to Claim 1 or 2.

Claims for the following Contracting State : AT

1. Product obtained by a microbiological process, comprising a consensus leukocyte interferon,
predominantly including the amino acids which are common to all naturally existing human alpha IFN
subtype sequences and including, at one or more positions where there is no amino acid common to
all the subtypes, an amino acid which predominantly exists at that position, and in no case including
any amino acid residue which does not exist at that position in at least one naturally existing subtype.

2. Product obtained by a microbiological process, comprising a consensus human leukocyte interferon
according to Claim 1, selected from the group consisting of:

[Arg22, Ala76, Asp78, Glu79, Tyr86, Tyr90, Leu96, Thr156, Asn157, Leu158] IFN-αF;

[Thr14, Met16, Arg22, Ala76, Asp78, Glu79, Tyr86, Tyr90, Leu96, Thr156, Asn157, Leu158] IFN-αF; and

[Arg22, Ala76, Asp78, Glu79, Tyr86, Tyr90, Leu96, Glu114, Arg121, Thr156, Asn157, Leu158] IFN-αF.

3. Gene (DNA molecule) obtained by a microbiological process and capable of directing the synthesis,
in a selected host microorganism, of consensus human leukocyte interferon according to Claim 1 or
2.